Playing repeated Stackelberg games with unknown opponents
Abstract
In Stackelberg games, a “leader” player first chooses a mixed strategy to commit to, and a “follower” player then responds based on the observed leader strategy. Notable strides have been made in scaling up algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. Toward remedying this shortcoming, we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays a myopic best response in every round. Our approach combines Monte Carlo Tree Search, which handles the leader's exploration/exploitation trade-offs, with a novel technique for identifying and pruning dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real-world security games spanning a double-digit number of rounds.
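The interplay between a Bayesian prior over follower types, a myopic best-responding follower, and the leader's exploration/exploitation trade-off can be illustrated with a toy model. What follows is a minimal sketch under stated assumptions, not the authors' method: exhaustive lookahead over a small, fixed set of candidate leader mixed strategies stands in for Monte Carlo Tree Search, no dominance pruning is performed, the leader's payoffs are assumed not to depend on the follower type, and follower ties are broken toward the lowest-index response. All function names, payoff matrices, and type labels (`best_response`, `plan`, types `"A"` and `"B"`, and so on) are hypothetical.

```python
"""Toy sketch: planning repeated Stackelberg play against an unknown myopic follower.

Hypothetical simplifications (not from the paper): the leader's payoffs do not
depend on the follower type, the leader chooses among a few fixed candidate
mixed strategies, and exhaustive lookahead stands in for MCTS.
"""


def best_response(leader_mix, follower_payoff):
    """Myopic best response of one follower type to an observed leader mix.

    follower_payoff[i][j] is the follower's utility when the leader plays pure
    action i and the follower plays j. Ties go to the lowest index j.
    """
    n_responses = len(follower_payoff[0])
    expected = [
        sum(p * follower_payoff[i][j] for i, p in enumerate(leader_mix))
        for j in range(n_responses)
    ]
    return max(range(n_responses), key=lambda j: expected[j])


def leader_utility(leader_mix, response, leader_payoff):
    """Leader's expected one-round utility when the follower plays `response`."""
    return sum(p * leader_payoff[i][response] for i, p in enumerate(leader_mix))


def response_distribution(belief, leader_mix, type_payoffs):
    """Map each possible follower response to (probability, posterior belief).

    Because each type responds deterministically, the Bayesian update simply
    keeps the types consistent with the observed response and renormalizes.
    """
    grouped = {}
    for follower_type, weight in belief.items():
        if weight > 0:
            r = best_response(leader_mix, type_payoffs[follower_type])
            grouped.setdefault(r, {})[follower_type] = weight
    result = {}
    for r, weights in grouped.items():
        total = sum(weights.values())
        result[r] = (total, {t: w / total for t, w in weights.items()})
    return result


def plan(belief, rounds, candidates, type_payoffs, leader_payoff):
    """Return (best expected total leader utility, best first-round mix)."""
    if rounds == 0:
        return 0.0, None
    best_value, best_mix = float("-inf"), None
    for mix in candidates:
        value = 0.0
        outcomes = response_distribution(belief, mix, type_payoffs)
        for r, (prob, posterior) in outcomes.items():
            # Immediate payoff plus the optimal value of the remaining rounds
            # under the updated belief.
            future, _ = plan(posterior, rounds - 1, candidates, type_payoffs, leader_payoff)
            value += prob * (leader_utility(mix, r, leader_payoff) + future)
        if value > best_value:
            best_value, best_mix = value, mix
    return best_value, best_mix


# Hypothetical 2x2 example: type "A" wants to match the leader's action,
# type "B" wants to mismatch it; the prior is uniform over the two types.
leader_payoff = [[4, 0], [1, 1]]
type_payoffs = {"A": [[1, 0], [0, 1]], "B": [[0, 1], [1, 0]]}
prior = {"A": 0.5, "B": 0.5}
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]

if __name__ == "__main__":
    print(plan(prior, 1, candidates, type_payoffs, leader_payoff))  # (2.5, (0.5, 0.5))
    print(plan(prior, 2, candidates, type_payoffs, leader_payoff))  # (5.25, (1.0, 0.0))
```

In this toy instance the best single-round commitment, (0.5, 0.5), draws the same response from both types and so reveals nothing. Over two rounds the planner instead opens with the pure strategy (1.0, 0.0): it earns 0.5 less in the first round but identifies the follower's type, for an expected total of 5.25 versus 5.0 from repeating the myopic choice. The exhaustive search here grows exponentially with the horizon, which is the kind of blow-up that sampling-based search and pruning of dominated leader strategies are meant to contain.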
Related articles
Learning against sequential opponents in repeated stochastic games
This article considers multiagent algorithms that aim to find the best response in strategic interactions by learning about the game and their opponents from observations. In contrast to many state-of-the-art algorithms that assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with cha...
Learning to Play Stackelberg Security Games
As discussed in previous chapters, algorithmic research on Stackelberg Security Games has had a striking real-world impact. But an algorithm that computes an optimal strategy for the defender can only be as good as the game it receives as input, and if that game is an inaccurate model of reality then the output of the algorithm will likewise be flawed. Consequently, researchers have introduced ...
Learning in and about Games
We study learning in finitely repeated 2×2 normal form games, when players have incomplete information about their opponents’ payoffs. In a laboratory experiment we investigate whether players (a) learn the game they are playing, (b) learn to predict the behavior of their opponent, and (c) learn to play according to a Nash equilibrium of the repeated game. Our results show that the success in ...
Towards a Fast Detection of Opponents in Repeated Stochastic Games
Multi-agent algorithms aim to find the best response in strategic interactions. While many state-of-the-art algorithms assume repeated interaction with a fixed set of opponents (or even self-play), a learner in the real world is more likely to encounter the same strategic situation with changing counter-parties. This article presents a formal model of such sequential interactions, and a corresp...
Reputation in Perturbed Repeated Games
The paper analyzes reputation effects in general perturbed repeated games with discounting. If there is some positive prior probability that one of the players is committed to play the same (pure or mixed) action in every period, then this provides a lower bound for her equilibrium payoff in all Nash equilibria. This bound is tight and independent of what other types have positive probability. ...